Analysis of GLDS-121 from NASA GeneLab

This R markdown file was auto-generated by the iDEP website Using iDEP 0.91, originally by Steven

Ge SX, Son EW, Yao R: iDEP: an integrated web application for differential expression and pathway analysis of RNA-Seq data. BMC Bioinformatics 2018, 19(1):534. PMID:30567491

1. Read data

First we set up the working directory to where the files are saved.

 setwd('~/Documents/HTML_R/GLDS121')

R packages and iDEP core Functions. Users can also download the iDEP_core_functions.R file. Many R packages needs to be installed first. This may take hours. Each of these packages took years to develop.So be a patient thief. Sometimes dependencies needs to be installed manually. If you are using an older version of R, and having trouble with package installation, try un-install the current version of R, delete all folders and files (C:/Program Files/R/R-3.4.3), and reinstall from scratch.

 if(file.exists('iDEP_core_functions.R'))
    source('iDEP_core_functions.R') else 
    source('https://raw.githubusercontent.com/iDEP-SDSU/idep/master/shinyapps/idep/iDEP_core_functions.R') 

We are using the downloaded gene expression file where gene IDs has been converted to Ensembl gene IDs. This is because the ID conversion database is too large to download. You can use your original file if your file uses Ensembl ID, or you do not want to use the pathway files available in iDEP (or it is not available).

 inputFile <- 'GLDS121_Expression.csv'
 sampleInfoFile <- 'GLDS121_Sampleinfo.csv'
 gldsMetadataFile <- 'GLDS121_Metadata.csv'
 geneInfoFile <- 'Arabidopsis_thaliana__athaliana_eg_gene_GeneInfo.csv' #Gene symbols, location etc. 
 geneSetFile <- 'Arabidopsis_thaliana__athaliana_eg_gene.db'  # pathway database in SQL; can be GMT format 
 STRING10_speciesFile <- 'https://raw.githubusercontent.com/iDEP-SDSU/idep/master/shinyapps/idep/STRING10_species.csv' 

Parameters for reading data

 input_missingValue <- 'geneMedian' #Missing values imputation method
 input_dataFileFormat <- 1  #1- read counts, 2 FKPM/RPKM or DNA microarray
 input_minCounts <- 0.5 #Min counts
 input_NminSamples <- 1 #Minimum number of samples 
 input_countsLogStart <- 4  #Pseudo count for log CPM
 input_CountsTransform <- 1 #Methods for data transformation of counts. 1-EdgeR's logCPM 2-VST, 3-rlog 
readMetadata.out <- readMetadata(gldsMetadataFile)
library(knitr)   #  install if needed. for showing tables with kable
library(kableExtra)
kable( readMetadata.out ) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%")
FLT_Rep1 FLT_Rep2 FLT_Rep3 GC_Rep1 GC_Rep2 GC_Rep3
Sample.LongId Atha.Ler.0.sShoots.FLT.Rep1.Array Atha.Ler.0.sShoots.FLT.Rep2.Array Atha.Ler.0.sShoots.FLT.Rep3.Array tha.Ler.0.sShoots.GC.Rep1 tha.Ler.0.sShoots.GC.Rep2 tha.Ler.0.sShoots.GC.Rep3
Sample.Id Atha.Ler.0.sShoots.FLT.Rep1 Atha.Ler.0.sShoots.FLT.Rep2 Atha.Ler.0.sShoots.FLT.Rep3 Atha.Ler.0.sShoots.GC.Rep1 Atha.Ler.0.sShoots.GC.Rep2 Atha.Ler.0.sShoots.GC.Rep3
Sample.Name Atha_Ler-0_sShoots_FLT_Rep1 Atha_Ler-0_sShoots_FLT_Rep2 Atha_Ler-0_sShoots_FLT_Rep3 Atha_Ler-0_sShoots_GC_Rep1 Atha_Ler-0_sShoots_GC_Rep2 Atha_Ler-0_sShoots_GC_Rep3
GLDS 121 121 121 121 121 121
Accession GLDS-121 GLDS-121 GLDS-121 GLDS-121 GLDS-121 GLDS-121
Hardware BRIC BRIC BRIC BRIC BRIC BRIC
Tissue Etiolated seedling Etiolated seedling Etiolated seedling Etiolated seedling Etiolated seedling Etiolated seedling
Age 14 days 14 days 14 days 14 days 14 days 14 days
Organism Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana
Ecotype Ler-0 Ler-0 Ler-0 Ler-0 Ler-0 Ler-0
Genotype WT WT WT WT WT WT
Variety Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT
Radiation Cosmic radiation Cosmic radiation Cosmic radiation Background Earth Background Earth Background Earth
Gravity Microgravity Microgravity Microgravity Terrestrial Terrestrial Terrestrial
Developmental Etiolated 14 day old seelings Etiolated 14 day old seelings Etiolated 14 day old seelings Etiolated 14 day old seelings Etiolated 14 day old seelings Etiolated 14 day old seelings
Time.series.or.Concentration.gradient Single time point Single time point Single time point Single time point Single time point Single time point
Light Dark Dark Dark Dark Dark Dark
Assay..RNAseq. Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling
Temperature Ambient shuttle Ambient shuttle Ambient shuttle Ambient shuttle Ambient shuttle Ambient shuttle
Treatment.type Biological Research in Canisters16 (BRIC16): Investigations of the plant cytoskeleton in microgravity with gene profiling and cytochemistry Biological Research in Canisters16 (BRIC16): Investigations of the plant cytoskeleton in microgravity with gene profiling and cytochemistry Biological Research in Canisters16 (BRIC16): Investigations of the plant cytoskeleton in microgravity with gene profiling and cytochemistry Biological Research in Canisters16 (BRIC16): Investigations of the plant cytoskeleton in microgravity with gene profiling and cytochemistry Biological Research in Canisters16 (BRIC16): Investigations of the plant cytoskeleton in microgravity with gene profiling and cytochemistry Biological Research in Canisters16 (BRIC16): Investigations of the plant cytoskeleton in microgravity with gene profiling and cytochemistry
Treatment.intensity Full Flight Full Flight Full Flight Full Flight Full Flight Full Flight
Treament.timing Full Flight Full Flight Full Flight Full Flight Full Flight Full Flight
Preservation.Method. RNAlater RNAlater RNAlater RNAlater RNAlater RNAlater
 readData.out <- readData(inputFile) 
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
   kable( head(readData.out$data) ) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
FLT_Rep1 FLT_Rep2 FLT_Rep3 GC_Rep1 GC_Rep2 GC_Rep3
AT5G52310 2.807355 2.584963 2.807355 3.807355 3.700440 3.807355
AT5G66780 2.584963 2.584963 3.169925 2.584963 3.700440 3.807355
AT3G01570 2.321928 2.321928 2.807355 2.321928 3.584963 3.700440
AT5G03860 2.321928 2.321928 2.807355 2.321928 3.459432 3.584963
AT1G02700 2.584963 2.584963 2.807355 2.807355 3.584963 3.700440
AT2G25890 2.584963 2.584963 2.807355 2.584963 3.584963 3.584963
 readSampleInfo.out <- readSampleInfo(sampleInfoFile) 
 kable( readSampleInfo.out ) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
Gravity
FLT_Rep1 Microgravity
FLT_Rep2 Microgravity
FLT_Rep3 Microgravity
GC_Rep1 Terrestrial
GC_Rep2 Terrestrial
GC_Rep3 Terrestrial
 input_selectOrg ="NEW" 
 input_selectGO <- NULL     #Gene set category 
 input_noIDConversion = TRUE  
 allGeneInfo.out <- geneInfo(geneInfoFile) 
 converted.out = NULL 
 convertedData.out <- convertedData()    
 nGenesFilter()  
## [1] "16156 genes in 6 samples. 16156  genes passed filter.\n Original gene IDs used."
 convertedCounts.out <- convertedCounts()  # converted counts, just for compatibility 

2. Pre-process

# Read counts per library 
 parDefault = par() 
 par(mar=c(12,4,2,2)) 
 # barplot of total read counts
 x <- readData.out$rawCounts
 groups = as.factor( detectGroups(colnames(x ) ) )
 if(nlevels(groups)<=1 | nlevels(groups) >20 )  
  col1 = 'green'  else
  col1 = rainbow(nlevels(groups))[ groups ]             
         
 barplot( colSums(x)/1e6, 
        col=col1,las=3, main="Total read counts (millions)")  

 readCountsBias()  # detecting bias in sequencing depth 
## [1] 0.01354819
## [1] 0.01354819
## [1] "Warning! Sequencing depth bias detected. Total read counts are significantly different among sample groups (p= 1.35e-02 ) based on ANOVA.  Total read counts seem to be correlated with factor Gravity (p= 1.35e-02 ).  "
 # Box plot 
 x = readData.out$data 
 boxplot(x, las = 2, col=col1,
    ylab='Transformed expression levels',
    main='Distribution of transformed data') 

 #Density plot 
 par(parDefault) 
## Warning in par(parDefault): graphical parameter "cin" cannot be set
## Warning in par(parDefault): graphical parameter "cra" cannot be set
## Warning in par(parDefault): graphical parameter "csi" cannot be set
## Warning in par(parDefault): graphical parameter "cxy" cannot be set
## Warning in par(parDefault): graphical parameter "din" cannot be set
## Warning in par(parDefault): graphical parameter "page" cannot be set
 densityPlot()       

 # Scatter plot of the first two samples 
 plot(x[,1:2],xlab=colnames(x)[1],ylab=colnames(x)[2], 
    main='Scatter plot of first two samples') 

 ####plot gene or gene family
 input_selectOrg ="BestMatch" 
 input_geneSearch <- 'HOXA' #Gene ID for searching 
 genePlot()  
## NULL
 input_useSD <- 'FALSE' #Use standard deviation instead of standard error in error bar? 
 geneBarPlotError()       
## NULL

3. Heatmap

 # hierarchical clustering tree
 x <- readData.out$data
 maxGene <- apply(x,1,max)
 # remove bottom 25% lowly expressed genes, which inflate the PPC
 x <- x[which(maxGene > quantile(maxGene)[1] ) ,] 
 plot(as.dendrogram(hclust2( dist2(t(x)))), ylab="1 - Pearson C.C.", type = "rectangle") 

 #Correlation matrix
 input_labelPCC <- TRUE #Show correlation coefficient? 
 correlationMatrix() 

 # Parameters for heatmap
 input_nGenes <- 1000   #Top genes for heatmap
 input_geneCentering <- TRUE    #centering genes ?
 input_sampleCentering <- FALSE #Center by sample?
 input_geneNormalize <- FALSE   #Normalize by gene?
 input_sampleNormalize <- FALSE #Normalize by sample?
 input_noSampleClustering <- FALSE  #Use original sample order
 input_heatmapCutoff <- 4   #Remove outliers beyond number of SDs 
 input_distFunctions <- 1   #which distant funciton to use
 input_hclustFunctions <- 1 #Linkage type
 input_heatColors1 <- 1 #Colors
 input_selectFactorsHeatmap <- 'Gravity'    #Sample coloring factors 
 png('heatmap.png', width = 10, height = 15, units = 'in', res = 300) 
 staticHeatmap() 
 dev.off()  
## png 
##   2

[heatmap] (heatmap.png)

 heatmapPlotly() # interactive heatmap using Plotly 

4. K-means clustering

 input_nGenesKNN <- 2000    #Number of genes fro k-Means
 input_nClusters <- 4   #Number of clusters 
 maxGeneClustering = 12000
 input_kmeansNormalization <- 'geneMean'    #Normalization
 input_KmeansReRun <- 0 #Random seed 

 distributionSD()  #Distribution of standard deviations 

 KmeansNclusters()  #Number of clusters 

 Kmeans.out = Kmeans()   #Running K-means 
 KmeansHeatmap()   #Heatmap for k-Means 

 #Read gene sets for enrichment analysis 
 sqlite  <- dbDriver('SQLite')
 input_selectGO3 <- 'GOBP'  #Gene set category
 input_minSetSize <- 15 #Min gene set size
 input_maxSetSize <- 2000   #Max gene set size 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO3,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  )  
 # Alternatively, users can use their own GMT files by
 #GeneSets.out <- readGMTRobust('somefile.GMT')  
 results <- KmeansGO()  #Enrichment analysis for k-Means clusters   
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
Cluster adj.Pval Genes Pathways
A 1.04e-19 111 Small molecule metabolic process
1.44e-17 118 Response to abiotic stimulus
3.56e-17 93 Oxidation-reduction process
3.56e-17 58 Cell wall organization or biogenesis
1.01e-16 37 Photosynthesis
2.82e-16 61 Drug metabolic process
2.82e-16 49 Cell wall organization
2.82e-16 100 Organonitrogen compound biosynthetic process
2.19e-15 49 External encapsulating structure organization
3.38e-15 70 Carbohydrate metabolic process
B 4.19e-19 46 Cell wall organization or biogenesis
1.12e-17 39 Cell wall organization
5.06e-17 39 External encapsulating structure organization
4.64e-15 46 Anatomical structure morphogenesis
7.68e-15 43 Drug metabolic process
7.41e-12 21 Drug catabolic process
1.03e-11 45 Carbohydrate metabolic process
1.50e-11 23 Carbohydrate catabolic process
8.42e-11 57 Catabolic process
1.38e-10 17 Polysaccharide catabolic process
C 8.49e-68 154 Response to abiotic stimulus
1.48e-58 126 Response to oxygen-containing compound
7.22e-50 102 Response to acid chemical
5.04e-44 121 Response to organic substance
2.95e-41 69 Response to abscisic acid
4.20e-41 69 Response to alcohol
2.84e-40 46 Cellular response to hypoxia
2.95e-40 46 Cellular response to decreased oxygen levels
2.95e-40 46 Cellular response to oxygen levels
7.88e-39 104 Response to hormone
D 3.04e-07 41 Oxidation-reduction process
4.86e-07 50 Response to abiotic stimulus
2.22e-06 38 Reproduction
2.22e-06 38 Reproductive process
2.22e-06 39 Cellular response to chemical stimulus
2.32e-06 33 Regulation of biological quality
6.57e-06 33 Developmental process involved in reproduction
6.57e-06 49 Multicellular organism development
6.57e-06 35 Post-embryonic development
8.44e-06 41 Small molecule metabolic process
 input_seedTSNE <- 0    #Random seed for t-SNE
 input_colorGenes <- TRUE   #Color genes in t-SNE plot? 
 tSNEgenePlot()  #Plot genes using t-SNE 

5. PCA and beyond

 input_selectFactors <- 'Sample_Name'  
 input_selectFactors2 <- 'Sample_Name' 
 input_tsneSeed2 <- 0   #Random seed for t-SNE 
 #PCA, MDS and t-SNE plots
 PCAplot()  

 MDSplot() 

 tSNEplot()  

 #Read gene sets for pathway analysis using PGSEA on principal components 
 input_selectGO6 <- 'GOBP' 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO6,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  )  
 PCApathway() # Run PGSEA analysis 
## Warning: Package 'KEGG.db' is deprecated and will be removed from Bioconductor
##   version 3.12

 cat( PCA2factor() )   #The correlation between PCs with factors 
## 
##  Correlation between Principal Components (PCs) with factors
## PC1 is correlated with Gravity (p=4.45e-03).

6. DEG1

 input_CountsDEGMethod <- 2 #DESeq2= 3,limma-voom=2,limma-trend=1 
 input_limmaPval <- 0.1 #FDR cutoff
 input_limmaFC <- 2 #Fold-change cutoff
 input_selectModelComprions <- 'Gravity: Microgravity vs. Terrestrial'  #Selected comparisons
 input_selectFactorsModel <- 'Gravity'  #Selected comparisons
 input_selectInteractions <- NULL   #Selected comparisons
 input_selectBlockFactorsModel <- NULL  #Selected comparisons
 factorReferenceLevels.out <- c('Gravity:Microgravity') 

 limma.out <- limma()
 DEG.data.out <- DEG.data()
 limma.out$comparisons 
## [1] "GC-FLT"
 input_selectComparisonsVenn = limma.out$comparisons[1:3] # use first three comparisons
 input_UpDownRegulated <- FALSE #Split up and down regulated genes 
 vennPlot() # Venn diagram 

  sigGeneStats() # number of DEGs as figure 

  sigGeneStatsTable() # number of DEGs as table 
##        Comparisons Up Down
## GC-FLT      GC-FLT  7   21

7. DEG2

 input_selectContrast <- 'Terrestrial-Microgravity' #Selected comparisons 
 selectedHeatmap.data.out <- selectedHeatmap.data()
 selectedHeatmap()   # heatmap for DEGs in selected comparison
## Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : 'data' must be of a vector type, was 'NULL'
 # Save gene lists and data into files
 write.csv( selectedHeatmap.data()$genes, 'heatmap.data.csv') 
 write.csv(DEG.data(),'DEG.data.csv' )
 write(AllGeneListsGMT() ,'AllGeneListsGMT.gmt')
 input_selectGO2 <- 'GOBP'  #Gene set category 
 geneListData.out <- geneListData()  
 volcanoPlot()  

  scatterPlot()  
## Error in findContrastSamples(input_selectContrast, colnames(convertedData.out), : object 'c.out' not found
  MAplot()  
## Error in findContrastSamples(input_selectContrast, colnames(convertedData.out), : object 'c.out' not found
  geneListGOTable.out <- geneListGOTable()  
 # Read pathway data again 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO2,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  ) 
 input_removeRedudantSets <- TRUE   #Remove highly redundant gene sets? 
 results <- geneListGO()  #Enrichment analysis
## Error in if (dim(results1)[2] == 1) return(results1) else {: argument is of length zero
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
Cluster adj.Pval Genes Pathways
A 1.04e-19 111 Small molecule metabolic process
1.44e-17 118 Response to abiotic stimulus
3.56e-17 93 Oxidation-reduction process
3.56e-17 58 Cell wall organization or biogenesis
1.01e-16 37 Photosynthesis
2.82e-16 61 Drug metabolic process
2.82e-16 49 Cell wall organization
2.82e-16 100 Organonitrogen compound biosynthetic process
2.19e-15 49 External encapsulating structure organization
3.38e-15 70 Carbohydrate metabolic process
B 4.19e-19 46 Cell wall organization or biogenesis
1.12e-17 39 Cell wall organization
5.06e-17 39 External encapsulating structure organization
4.64e-15 46 Anatomical structure morphogenesis
7.68e-15 43 Drug metabolic process
7.41e-12 21 Drug catabolic process
1.03e-11 45 Carbohydrate metabolic process
1.50e-11 23 Carbohydrate catabolic process
8.42e-11 57 Catabolic process
1.38e-10 17 Polysaccharide catabolic process
C 8.49e-68 154 Response to abiotic stimulus
1.48e-58 126 Response to oxygen-containing compound
7.22e-50 102 Response to acid chemical
5.04e-44 121 Response to organic substance
2.95e-41 69 Response to abscisic acid
4.20e-41 69 Response to alcohol
2.84e-40 46 Cellular response to hypoxia
2.95e-40 46 Cellular response to decreased oxygen levels
2.95e-40 46 Cellular response to oxygen levels
7.88e-39 104 Response to hormone
D 3.04e-07 41 Oxidation-reduction process
4.86e-07 50 Response to abiotic stimulus
2.22e-06 38 Reproduction
2.22e-06 38 Reproductive process
2.22e-06 39 Cellular response to chemical stimulus
2.32e-06 33 Regulation of biological quality
6.57e-06 33 Developmental process involved in reproduction
6.57e-06 49 Multicellular organism development
6.57e-06 35 Post-embryonic development
8.44e-06 41 Small molecule metabolic process

STRING-db API access. We need to find the taxonomy id of your species, this used by STRING. First we try to guess the ID based on iDEP’s database. Users can also skip this step and assign NCBI taxonomy id directly by findTaxonomyID.out = 10090 # mouse 10090, human 9606 etc.

 STRING10_species = read.csv(STRING10_speciesFile)  
 ix = grep('Arabidopsis thaliana', STRING10_species$official_name ) 
 findTaxonomyID.out <- STRING10_species[ix,1] # find taxonomyID
 findTaxonomyID.out  
## [1] 3702

Enrichment analysis using STRING

  STRINGdb_geneList.out <- STRINGdb_geneList() #convert gene lists
 input_STRINGdbGO <- 'Process'  #'Process', 'Component', 'Function', 'KEGG', 'Pfam', 'InterPro' 
 results <- stringDB_GO_enrichmentData()  # enrichment using STRING 
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
x
NULL

PPI network retrieval and analysis

 input_nGenesPPI <- 100 #Number of top genes for PPI retrieval and analysis 
 stringDB_network1(1) #Show PPI network 

Generating interactive PPI

 write(stringDB_network_link(), 'PPI_results.html') # write results to html file 
## Warning: 'string_db$get_link' is deprecated.
## Use 'Contact developers to request functionality' instead.
## See help("Deprecated")

## Warning: 'string_db$get_link' is deprecated.
## Use 'Contact developers to request functionality' instead.
## See help("Deprecated")

## Warning: 'string_db$get_link' is deprecated.
## Use 'Contact developers to request functionality' instead.
## See help("Deprecated")
 browseURL('PPI_results.html') # open in browser 

8. Pathway analysis

 input_selectContrast1 = limma.out$comparisons[1] 
 #input_selectContrast1 = limma.out$comparisons[3] # manually set 
 input_selectGO = 'GOBP'  # gene set category 
 #input_selectGO='custom' # if custom gmt file
 input_minSetSize <- 15 #Min size for gene set
 input_maxSetSize <- 2000   #Max size for gene set 
 # Read pathway data again 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  ) 
 input_pathwayPvalCutoff <- 0.2 #FDR cutoff
 input_nPathwayShow <- 30   #Top pathways to show
 input_absoluteFold <- FALSE    #Use absolute values of fold-change?
 input_GenePvalCutoff <- 1  #FDR to remove genes 

 input_pathwayMethod = 1  # 1  GAGE
 gagePathwayData.out <- gagePathwayData()  # pathway analysis using GAGE  
   
 results <- gagePathwayData.out  #Enrichment analysis for k-Means clusters  
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
Direction GAGE analysis: GC vs FLT statistic Genes adj.Pval
Down Cell wall organization or biogenesis -5.5059 404 4.7e-05
Cell wall organization -5.1529 318 1.6e-04
Microtubule-based process -4.9086 141 4.1e-04
Galacturonan metabolic process -4.8483 101 4.1e-04
Pectin metabolic process -4.8448 100 4.1e-04
External encapsulating structure organization -4.8196 336 4.1e-04
Polysaccharide metabolic process -4.6216 274 6.4e-04
Microtubule-based movement -4.6108 49 1.2e-03
Plant-type cell wall organization or biogenesis -4.5785 173 7.8e-04
Cell division -4.4975 240 9.1e-04
Movement of cell or subcellular component -4.4775 70 1.4e-03
DNA replication -4.2756 119 2.0e-03
Plant-type cell wall organization -4.2402 103 2.3e-03
Drug metabolic process -4.2353 459 2.0e-03
DNA-dependent DNA replication -4.203 102 2.5e-03
Cytoskeleton organization -3.9662 165 5.3e-03
Mitotic cell cycle process -3.8709 166 7.3e-03
Up Response to drug 5.3506 337 1.1e-04
Response to chitin 4.9326 86 7.9e-04
Response to hypoxia 4.6985 158 7.9e-04
Response to oxygen levels 4.6481 161 7.9e-04
Response to decreased oxygen levels 4.6272 160 7.9e-04
Cellular response to hypoxia 4.6131 145 7.9e-04
Cellular response to decreased oxygen levels 4.5888 146 7.9e-04
Cellular response to oxygen levels 4.5888 146 7.9e-04
Response to temperature stimulus 4.458 348 1.0e-03
Response to organonitrogen compound 4.404 157 1.4e-03
Response to hydrogen peroxide 4.024 52 9.6e-03
Response to heat 3.8951 133 9.9e-03
Response to water 3.79 235 1.2e-02
 pathwayListData.out = pathwayListData() 
 enrichmentPlot(pathwayListData.out, 25  ) 

  enrichmentNetwork(pathwayListData.out )  

  enrichmentNetworkPlotly(pathwayListData.out) 

## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
 input_pathwayMethod = 3  # 1  fgsea 
 fgseaPathwayData.out <- fgseaPathwayData() #Pathway analysis using fgsea 
## Warning in fgsea(pathways = gmt, stats = fold, minSize = input_minSetSize, :
## You are trying to run fgseaSimple. It is recommended to use fgseaMultilevel. To
## run fgseaMultilevel, you need to remove the nperm argument in the fgsea function
## call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (92.25% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
 results <- fgseaPathwayData.out  #Enrichment analysis for k-Means clusters 
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
Direction GSEA analysis: GC vs FLT NES Genes adj.Pval
Down Pectin metabolic process -2.5234 100 7.4e-03
Galacturonan metabolic process -2.5189 101 7.4e-03
Cell wall organization -2.4187 318 7.4e-03
Cell wall organization or biogenesis -2.3668 404 7.4e-03
Plant-type cell wall organization or biogenesis -2.3524 173 7.4e-03
External encapsulating structure organization -2.324 336 7.4e-03
Plant-type cell wall organization -2.3121 103 7.4e-03
Microtubule-based movement -2.2685 49 7.4e-03
DNA replication -2.2297 119 7.4e-03
Microtubule-based process -2.2214 141 7.4e-03
DNA-dependent DNA replication -2.2187 102 7.4e-03
Pectin catabolic process -2.2161 40 1.1e-02
Pectin biosynthetic process -2.2124 30 1.2e-02
Movement of cell or subcellular component -2.1949 70 7.4e-03
Polysaccharide catabolic process -2.1569 83 7.4e-03
Cell cycle DNA replication -2.1547 49 7.4e-03
Up Response to hydrogen peroxide 2.7682 52 1.1e-02
Response to heat 2.5673 133 1.2e-02
Cellular response to hypoxia 2.5558 145 1.2e-02
Cellular response to decreased oxygen levels 2.551 146 1.2e-02
Cellular response to oxygen levels 2.551 146 1.2e-02
Response to hypoxia 2.5382 158 1.2e-02
Response to decreased oxygen levels 2.5225 160 1.2e-02
Response to oxygen levels 2.5203 161 1.2e-02
Response to chitin 2.4954 86 1.2e-02
Response to reactive oxygen species 2.4945 101 1.2e-02
Response to temperature stimulus 2.3753 348 1.5e-02
Response to water 2.2412 235 1.3e-02
Response to water deprivation 2.2077 231 1.3e-02
Regulation of DNA-templated transcription in response to stress 2.2052 21 1.1e-02
  pathwayListData.out = pathwayListData() 
 enrichmentPlot(pathwayListData.out, 25  ) 

  enrichmentNetwork(pathwayListData.out )  

  enrichmentNetworkPlotly(pathwayListData.out) 

   PGSEAplot() # pathway analysis using PGSEA 
## Error in findContrastSamples(input_selectContrast1, colnames(convertedData.out), : object 'c.out' not found

9. Chromosome

 input_selectContrast2 = limma.out$comparisons[1] 
 #input_selectContrast2 = limma.out$comparisons[3] # manually set
 input_limmaPvalViz <- 0.1  #FDR to filter genes
 input_limmaFCViz <- 2  #FDR to filter genes 
 genomePlotly() # shows fold-changes on the genome 
## Warning in eval(quote(list(...)), env): NAs introduced by coercion
## Warning in genomePlotly(): NAs introduced by coercion

10. Biclustering

 input_nGenesBiclust <- 1000    #Top genes for biclustering
 input_biclustMethod <- 'BCCC()'    #Method: 'BCCC', 'QUBIC', 'runibic' ... 
 biclustering.out = biclustering()  # run analysis

 input_selectBicluster <- NULL  #select a cluster 
 biclustHeatmap()   # heatmap for selected cluster 
## Error in res[[i]] <- x[BicRes@RowxNumber[, number[i]], BicRes@NumberxCol[number[i], : attempt to select less than one element in integerOneIndex
 input_selectGO4 = 'GOBP'  # gene set category 
 # Read pathway data again 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO4,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  )  
 results <- geneListBclustGO()  #Enrichment analysis for k-Means clusters   
## Error in res[[i]] <- x[BicRes@RowxNumber[, number[i]], BicRes@NumberxCol[number[i], : attempt to select less than one element in integerOneIndex
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
Direction GSEA analysis: GC vs FLT NES Genes adj.Pval
Down Pectin metabolic process -2.5234 100 7.4e-03
Galacturonan metabolic process -2.5189 101 7.4e-03
Cell wall organization -2.4187 318 7.4e-03
Cell wall organization or biogenesis -2.3668 404 7.4e-03
Plant-type cell wall organization or biogenesis -2.3524 173 7.4e-03
External encapsulating structure organization -2.324 336 7.4e-03
Plant-type cell wall organization -2.3121 103 7.4e-03
Microtubule-based movement -2.2685 49 7.4e-03
DNA replication -2.2297 119 7.4e-03
Microtubule-based process -2.2214 141 7.4e-03
DNA-dependent DNA replication -2.2187 102 7.4e-03
Pectin catabolic process -2.2161 40 1.1e-02
Pectin biosynthetic process -2.2124 30 1.2e-02
Movement of cell or subcellular component -2.1949 70 7.4e-03
Polysaccharide catabolic process -2.1569 83 7.4e-03
Cell cycle DNA replication -2.1547 49 7.4e-03
Up Response to hydrogen peroxide 2.7682 52 1.1e-02
Response to heat 2.5673 133 1.2e-02
Cellular response to hypoxia 2.5558 145 1.2e-02
Cellular response to decreased oxygen levels 2.551 146 1.2e-02
Cellular response to oxygen levels 2.551 146 1.2e-02
Response to hypoxia 2.5382 158 1.2e-02
Response to decreased oxygen levels 2.5225 160 1.2e-02
Response to oxygen levels 2.5203 161 1.2e-02
Response to chitin 2.4954 86 1.2e-02
Response to reactive oxygen species 2.4945 101 1.2e-02
Response to temperature stimulus 2.3753 348 1.5e-02
Response to water 2.2412 235 1.3e-02
Response to water deprivation 2.2077 231 1.3e-02
Regulation of DNA-templated transcription in response to stress 2.2052 21 1.1e-02

11. Co-expression network

 input_mySoftPower <- 5 #SoftPower to cutoff
 input_nGenesNetwork <- 1000    #Number of top genes
 input_minModuleSize <- 20  #Module size minimum 
 wgcna.out = wgcna()   # run WGCNA  
## Warning: executing %dopar% sequentially: no parallel backend registered
##    Power SFT.R.sq    slope truncated.R.sq mean.k. median.k. max.k.
## 1      1 0.858000  2.98000         0.8940   673.0     704.0    808
## 2      2 0.870000  1.53000         0.9030   511.0     536.0    689
## 3      3 0.813000  0.97600         0.8110   409.0     426.0    601
## 4      4 0.754000  0.67300         0.6990   338.0     349.0    533
## 5      5 0.556000  0.46400         0.4350   286.0     289.0    479
## 6      6 0.554000  0.33600         0.4270   246.0     243.0    434
## 7      7 0.375000  0.22800         0.2200   214.0     208.0    396
## 8      8 0.136000  0.12900        -0.0923   189.0     181.0    363
## 9      9 0.000381 -0.00899        -0.1430   168.0     158.0    335
## 10    10 0.035200 -0.08210         0.0194   150.0     138.0    311
## 11    12 0.139000 -0.19000         0.1860   123.0     109.0    270
## 12    14 0.233000 -0.27900         0.2630   103.0      88.2    238
## 13    16 0.361000 -0.37300         0.4380    88.1      76.1    212
## 14    18 0.478000 -0.42900         0.5820    76.2      63.5    190
## 15    20 0.529000 -0.47000         0.6330    66.9      54.2    171
## TOM calculation: adjacency..
## ..will not use multithreading.
##  Fraction of slow calculations: 0.000000
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
 softPower()  # soft power curve 

  modulePlot()  # plot modules  

  listWGCNA.Modules.out = listWGCNA.Modules() #modules
 input_selectGO5 <- 'GOBP'  #Gene set category 
 # Read pathway data again 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO5,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  ) 
 input_selectWGCNA.Module <- '1. turquoise (166 genes)' #Select a module
 input_topGenesNetwork <- 10    #SoftPower to cutoff
 input_edgeThreshold <- 0.4 #Number of top genes 
 moduleNetwork()    # show network of top genes in selected module
##  softConnectivity: FYI: connecitivty of genes with less than 4 valid samples will be returned as NA.
##  ..calculating connectivities..

 input_removeRedudantSets <- TRUE   #Remove redundant gene sets 
 results <- networkModuleGO()  #Enrichment analysis of selected module
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
adj.Pval Genes Pathways
1.4e-20 66 Response to abiotic stimulus
5.8e-12 44 Cellular response to chemical stimulus
5.8e-12 45 Response to oxygen-containing compound
1.6e-10 48 Response to organic substance
6.1e-09 41 Response to endogenous stimulus
1.2e-08 40 Response to hormone
1.5e-08 32 Defense response
1.6e-08 33 Response to acid chemical
8.8e-08 28 Response to inorganic substance
9.3e-08 14 Cellular response to decreased oxygen levels